Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents
Authors
Abstract
Planning agents often lack the computational resources needed to build full planning trees for their environments. Agent designers commonly compensate for the resulting finite-horizon approximation by applying an evaluation function at the leaf states of the planning tree. Recent work has proposed an alternative approach to overcoming computational constraints on agent design: modify the reward function. In this work, we compare this reward design approach to the common leaf-evaluation heuristic approach for improving planning agents. We show that for many agents the reward design approach strictly subsumes the leaf-evaluation approach, i.e., for every leaf-evaluation heuristic there exists a reward function that leads to equivalent behavior, but the converse is not true. We demonstrate that this generality leads to improved performance when an agent makes approximations in addition to the finite-horizon approximation. As part of our contribution, we extend PGRD, an online reward design algorithm, to develop reward design algorithms for Sparse Sampling and UCT, two algorithms capable of planning in large state spaces.
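To make the subsumption claim concrete, the following minimal sketch (a hypothetical chain MDP and heuristic, not taken from the paper) folds a leaf-evaluation heuristic h into the reward via potential-based shaping: a depth-limited planner that uses the shaped reward with zero-valued leaves produces action values that differ from the leaf-evaluation planner's only by a root-dependent constant, so both select the same action. This is one standard construction that reproduces leaf-evaluation behavior through the reward; the paper's own construction and its PGRD-based algorithms for Sparse Sampling and UCT are not reproduced here.

```python
# Illustrative sketch (not the paper's exact construction): folding a
# leaf-evaluation heuristic h into the reward via potential-based shaping,
# so that depth-limited planning with the shaped reward and zero-valued
# leaves ranks actions identically to planning with the original reward
# and h applied at the leaves. The chain MDP below is hypothetical.

GAMMA = 0.9
N = 8                      # states 0..N-1 on a chain; state N-1 is rewarding
ACTIONS = (-1, +1)         # step left or right

def step(s, a):
    """Deterministic toy transition: move along the chain, clipped at the ends."""
    return max(0, min(N - 1, s + a))

def reward(s, a, s2):
    """Original reward: +1 for reaching the rightmost state, 0 otherwise."""
    return 1.0 if s2 == N - 1 else 0.0

def h(s):
    """Leaf-evaluation heuristic: states closer to the goal look better."""
    return -abs((N - 1) - s) * 0.1

def plan(s, depth, r_fn, leaf_fn):
    """Depth-limited deterministic lookahead; returns the best achievable value."""
    if depth == 0:
        return leaf_fn(s)
    return max(r_fn(s, a, step(s, a)) + GAMMA * plan(step(s, a), depth - 1, r_fn, leaf_fn)
               for a in ACTIONS)

def q_values(s, depth, r_fn, leaf_fn):
    """Action values at the root under a given reward and leaf evaluation."""
    return {a: r_fn(s, a, step(s, a)) + GAMMA * plan(step(s, a), depth - 1, r_fn, leaf_fn)
            for a in ACTIONS}

# Reward-design alternative: shape the reward with potential h and use zero leaves.
def shaped_reward(s, a, s2):
    return reward(s, a, s2) + GAMMA * h(s2) - h(s)

if __name__ == "__main__":
    s0, depth = 2, 3
    q_leaf   = q_values(s0, depth, reward, h)                  # leaf-evaluation planner
    q_design = q_values(s0, depth, shaped_reward, lambda s: 0) # reward-design planner
    # The shaping terms telescope along every depth-limited trajectory, so the
    # two Q-value sets differ only by the constant -h(s0) and argmax agrees.
    for a in ACTIONS:
        assert abs(q_design[a] - (q_leaf[a] - h(s0))) < 1e-9
    print(q_leaf, q_design)
```

The one-way nature of the claim is also visible here: any leaf heuristic can be moved into the reward this way, but an arbitrary modified reward need not correspond to any leaf-evaluation function.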
Similar References
SCALABLE PLANNING UNDER UNCERTAINTY
Autonomous agents that act in the real-world can often improve their success by capturing the uncertainty that arises because of their imperfect knowledge and potentially faulty actions. By making plans robust to uncertainty, agents can be prepared to counteract plan failure or act upon information that becomes available during plan execution. Such robust plans are valuable, but are often diffi...
Logical Encodings With No Time Indexes for Defining and Computing Admissible Heuristics for Planning
A limitation of the SAT approach to planning and the more recent Weighted-SAT approach to planning with preferences is the use of logical encodings where every fluent and action must be tagged with a time index. The result is that the complexity of the encodings grows exponentially with the planning horizon, and for metrics other than makespan, the optimality achieved is conditional on the plan...
Cost-Optimal Planning with Landmarks
Planning landmarks are facts that must be true at some point in every solution plan. Previous work has very successfully exploited planning landmarks in satisficing (non-optimal) planning. We propose a methodology for deriving admissible heuristic estimates for cost-optimal planning from a set of planning landmarks. The resulting heuristics fall into a novel class of multi-path dependent heuris...
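As a rough illustration of the landmark idea above (not the multi-path dependent heuristics of the cited work), the sketch below computes a simple landmark-count estimate. Its admissibility rests on two stated assumptions: unit action costs and no action achieving more than one of the remaining landmarks, in which case every unachieved landmark forces at least one distinct action and the count never overestimates the remaining cost. The predicate names and the example landmark set are hypothetical.

```python
# Illustrative sketch only (not the cited paper's heuristics): a simple
# landmark-count estimate for cost-optimal search. Admissibility here assumes
# unit action costs and that no action achieves more than one remaining landmark.

from typing import FrozenSet, Set

def landmark_count_heuristic(state: FrozenSet[str],
                             accepted: Set[str],
                             landmarks: Set[str]) -> int:
    """Estimate remaining cost as the number of landmarks not yet achieved.

    `accepted` tracks landmarks that have held at some point along the path
    to `state`; landmarks are path-dependent facts, so they are tracked during
    search rather than recomputed from the state alone.
    """
    still_needed = {lm for lm in landmarks
                    if lm not in accepted and lm not in state}
    return len(still_needed)

# Hypothetical usage inside an A*-style search:
landmarks = {"have_key", "door_open", "at_goal"}
state = frozenset({"at_start", "have_key"})
accepted = {"have_key"}                      # achieved earlier on this path
print(landmark_count_heuristic(state, accepted, landmarks))  # -> 2
```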
PAC optimal MDP planning with application to invasive species management
In a simulator-defined MDP, the Markovian dynamics and rewards are provided in the form of a simulator from which samples can be drawn. This paper studies MDP planning algorithms that attempt to minimize the number of simulator calls before terminating and outputting a policy that is approximately optimal with high probability. The paper introduces two heuristics for efficient exploration and a...
Sensible Agent Technology Improving Coordination and Communication in Biosurveillance Domains